PTree: pattern-based, stochastic search for maximum parsimony phylogenies
Abstract
Phylogenetic reconstruction is vital for analyzing the evolutionary relationships of genes within and across populations of different species. Next-generation sequencing technologies now produce datasets comprising thousands of sequences. For such datasets, robustly identifying the tree topology that is optimal under a standard criterion, such as maximum parsimony, maximum likelihood or posterior probability, is a computationally very demanding task for phylogenetic inference methods. Here, we describe a stochastic search method for maximum parsimony trees, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that efficiently infers intermediate sequences whose incorporation into the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable in topological accuracy and runtime to the algorithms implemented in PAUP* and TNT, which are widely used by the bioinformatics community. We show that our method can process large-scale datasets of 1,000 to 8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available at: http://algbio.cs.uni-duesseldorf.de/webapps/wa-download/.
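The quantity a maximum parsimony search tries to minimize is the total number of character changes a topology requires. A standard way to compute this cost for a fixed binary tree is Fitch's small-parsimony algorithm, sketched below as an illustration. It is not PTree's pattern-based search. The nested-tuple tree encoding and the names `fitch_site` and `parsimony_score` are hypothetical choices made for this example. The score of an unrooted tree does not depend on where it is rooted, so any rooting of the topology works here.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one alignment column: return (candidate state set, changes)."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch cost over all columns of an aligned set of sequences."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        _, cost = fitch_site(tree, column)
        total += cost
    return total


aln = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 4
```

On this toy alignment, the topology grouping A with B is more parsimonious (cost 3) than the one grouping A with C (cost 4). A heuristic search over topologies, whether stochastic or exhaustive, compares candidates by exactly this kind of score.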